Checking Hadoop's running status: how to tell whether the Hadoop environment is configured successfully

2024-07-01 22:50 | Source: compiled from the web

Download Hadoop

wget https://dlcdn.apache.org/hadoop/common/hadoop-3.2.3/hadoop-3.2.3.tar.gz --no-check-certificate

Configure environment variables

First edit the current user's shell configuration file, ~/.bashrc, and add the Hadoop environment variables.
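The article does not list the lines themselves, so the following is only a minimal sketch. It assumes the archive was unpacked to $HOME/hadoop-3.2.3 (an assumed path). HADOOP_HOME is the conventional variable name; HADOOP is set as well only because later commands in this article refer to $HADOOP. Hadoop also needs a working Java installation (JAVA_HOME), which is not covered here.

```shell
# Lines to append to ~/.bashrc.
# Assumption: Hadoop was unpacked to $HOME/hadoop-3.2.3; adjust to your path.
export HADOOP_HOME="$HOME/hadoop-3.2.3"
# Assumption: later commands in this article use $HADOOP, so mirror the value.
export HADOOP="$HADOOP_HOME"
# Make the hadoop, hdfs and yarn launchers available from any directory.
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```

After saving the file, run source ~/.bashrc (or open a new shell) so the variables take effect.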

Hadoop is ready to use as soon as it is unpacked. Run hadoop version from any directory; if it prints the Hadoop version, the environment is configured correctly.
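The same check can be wrapped in a small helper, sketched here. check_hadoop is a hypothetical name; it relies only on the hadoop version command mentioned above and prints the first line of its output.

```shell
# Print the first line of `hadoop version`, or fail if hadoop is not on PATH.
check_hadoop() {
  if command -v hadoop >/dev/null 2>&1; then
    hadoop version | head -n 1
  else
    echo "hadoop not found on PATH; check ~/.bashrc" >&2
    return 1
  fi
}
```

Calling check_hadoop from a different directory than the installation is the useful case, since that is what proves PATH is set.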

Starting and stopping

Start the namenode, datanode, nodemanager and resourcemanager

hdfs --daemon start namenode
hdfs --daemon start datanode
yarn --daemon start nodemanager
yarn --daemon start resourcemanager

Stop the namenode, datanode, nodemanager and resourcemanager

hdfs --daemon stop namenode
hdfs --daemon stop datanode
yarn --daemon stop nodemanager
yarn --daemon stop resourcemanager
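Since starting and stopping differ only in one word, the four commands can be generated by a helper. This is a sketch; daemon_commands is a hypothetical name, and it only prints the commands so they can be reviewed before being run.

```shell
# Print the four daemon commands for the given action (start or stop).
daemon_commands() {
  case "$1" in
    start|stop) ;;
    *) echo "usage: daemon_commands start|stop" >&2; return 1 ;;
  esac
  # HDFS daemons are controlled through the hdfs launcher.
  for daemon in namenode datanode; do
    echo "hdfs --daemon $1 $daemon"
  done
  # YARN daemons are controlled through the yarn launcher.
  for daemon in resourcemanager nodemanager; do
    echo "yarn --daemon $1 $daemon"
  done
}
```

To execute them, pipe the output to a shell, for example daemon_commands start | sh.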

Starting the namenode may fail with: ERROR: Cannot set priority of namenode process 9303

The log shows what actually went wrong: tail -n 20 hadoop-3.2.3/logs/hadoop-root-namenode-starrocks1.log (tail -n N prints the last N lines of a file)
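When the tail of the log is mostly stack trace, filtering by log level can help. The helper below is a sketch; recent_errors is a hypothetical name, and it assumes the usual log line layout in which the level (ERROR or FATAL) appears as a separate word after the timestamp.

```shell
# Print the last few ERROR/FATAL lines of a log file.
# $1: path to the log file, $2: how many lines to show (default 5).
recent_errors() {
  grep -E ' (ERROR|FATAL) ' "$1" | tail -n "${2:-5}"
}
```

For example: recent_errors hadoop-3.2.3/logs/hadoop-root-namenode-starrocks1.log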

The cause is: Invalid URI for NameNode address (check fs.defaultFS): file:/// has no authority. According to the Hadoop website, fs.defaultFS is a property of the core-site.xml configuration file.

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/core-default.xml

Edit core-site.xml: vim hadoop-3.2.3/etc/hadoop/core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
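To confirm which value a configuration file actually contains, a property can be read back from it. This is a rough sketch: get_property is a hypothetical helper that does a naive text match and is not a real XML parser. When hadoop is on PATH, hdfs getconf -confKey fs.defaultFS reports the value Hadoop itself sees.

```shell
# Print the <value> that follows <name>KEY</name> in a Hadoop XML config file.
# $1: path to the XML file, $2: property name.
get_property() {
  awk -v key="$2" '
    index($0, "<name>" key "</name>") { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      # Strip the 7-character <value> and 8-character </value> tags.
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$1"
}
```

For example: get_property hadoop-3.2.3/etc/hadoop/core-site.xml fs.defaultFS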

jps shows that the namenode still has not started. Check the log again with tail -n 20 hadoop-3.2.3/logs/hadoop-root-namenode-starrocks1.log. This time the error is: Directory /tmp/hadoop-root/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.

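CentOS automatically cleans up files under /tmp after each shutdown, which means the previously recorded name data may be lost. Once that happens, the only fix is to format the entire NameNode with bin/hadoop namenode -format (the older form of hdfs namenode -format). So the startup error is effectively telling you to point hadoop.tmp.dir at a directory that will not be wiped.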

Edit core-site.xml again: vim hadoop-3.2.3/etc/hadoop/core-site.xml

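<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/root/hadoop-3.2.3/tmp/</value>
  </property>
</configuration>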
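For a scripted setup, the same two properties can be written with a here-document instead of vim. This is a sketch under assumptions: write_core_site is a hypothetical helper, it overwrites any existing core-site.xml in the given directory, and the hdfs://localhost:9000 address is the one used in this article.

```shell
# Write a minimal core-site.xml.
# $1: configuration directory (e.g. hadoop-3.2.3/etc/hadoop)
# $2: value for hadoop.tmp.dir
# Warning: this replaces the existing file in $1.
write_core_site() {
  conf_dir=$1
  tmp_dir=$2
  cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$tmp_dir</value>
  </property>
</configuration>
EOF
}
```

For example: write_core_site hadoop-3.2.3/etc/hadoop /root/hadoop-3.2.3/tmp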

The error now reads: Directory /root/hadoop-3.2.3/tmp/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible. Create the directory with mkdir -p /root/hadoop-3.2.3/tmp/dfs/name (mkdir -p creates any missing parent directories as well).
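The directory step, together with the /tmp concern raised earlier, can be sketched as one helper. prepare_hadoop_tmp is a hypothetical name; it takes the value chosen for hadoop.tmp.dir, warns when that path lies under /tmp, and creates the dfs/name subdirectory.

```shell
# Create the NameNode storage directory under the given hadoop.tmp.dir.
# $1: the directory configured as hadoop.tmp.dir
prepare_hadoop_tmp() {
  case "$1" in
    /tmp|/tmp/*)
      echo "warning: $1 is under /tmp and may be cleaned by the OS" >&2 ;;
  esac
  # Succeeds only if the directory exists afterwards and is writable.
  mkdir -p "$1/dfs/name" && [ -w "$1/dfs/name" ]
}
```

For example: prepare_hadoop_tmp /root/hadoop-3.2.3/tmp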

If the error is NameNode is not formatted, format the NameNode: hadoop-3.2.3/bin/hdfs namenode -format
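Formatting erases existing NameNode metadata, so it is worth checking first whether the directory is already formatted. The sketch below assumes the default layout, in which the name directory is dfs/name under hadoop.tmp.dir and a formatted directory contains current/VERSION. namenode_is_formatted is a hypothetical helper.

```shell
# Succeed if the NameNode directory under the given hadoop.tmp.dir
# already contains a VERSION file, i.e. it has been formatted.
# $1: the directory configured as hadoop.tmp.dir
namenode_is_formatted() {
  [ -f "$1/dfs/name/current/VERSION" ]
}
```

For example: namenode_is_formatted /root/hadoop-3.2.3/tmp || hadoop-3.2.3/bin/hdfs namenode -format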

Web management pages
HDFS health information: http://localhost:9870
Hadoop cluster information: http://127.0.0.1:8088/

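If a page cannot be reached, first use jps to check that the four related processes are running. Then check whether the firewall is turned off; the command to stop it is systemctl stop firewalld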
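The jps check can be automated with a small filter, sketched here. missing_daemons is a hypothetical name; it reads jps output on standard input and prints every expected daemon that is absent, matching the process name exactly so that SecondaryNameNode is not mistaken for NameNode.

```shell
# Read `jps` output from stdin and print each expected daemon that is missing.
missing_daemons() {
  jps_output=$(cat)
  for name in NameNode DataNode ResourceManager NodeManager; do
    # jps prints "<pid> <name>"; compare the second column exactly.
    printf '%s\n' "$jps_output" |
      awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' ||
      echo "$name"
  done
}
```

Usage: jps | missing_daemons. Empty output means all four daemons are running.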

It looks like you are making an HTTP request to a Hadoop IPC port. This is not the correct port for the web interface on this daemon.

If you see this message, your configuration and the daemons are fine; you have simply opened one of the daemon's non-web ports in the browser.
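As a quick reference for the ports that appear in this article, the mapping can be written down as a lookup. This is only a sketch: port_role is a hypothetical helper, and it covers just the three ports used here (9870 and 8088 for the web pages, 9000 from the fs.defaultFS setting).

```shell
# Describe the role of a port used in this article.
port_role() {
  case "$1" in
    9870) echo "NameNode web UI (HTTP)" ;;
    8088) echo "ResourceManager web UI (HTTP)" ;;
    9000) echo "NameNode IPC port from fs.defaultFS (not HTTP)" ;;
    *)    echo "unknown" ;;
  esac
}
```

Opening http://localhost:9000 in a browser is the typical way to trigger the message quoted above.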

Hadoop standalone configuration (non-distributed)

Hadoop's default mode is non-distributed (local mode), which runs without any configuration. The commands below run the demos bundled with Hadoop.

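This mode runs on a single machine and has no distributed file system; it reads and writes the local operating system's file system directly. It is generally used only for debugging MapReduce programs locally.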

The following command lists the available demos

cd ${HADOOP}
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar
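The examples jar carries the Hadoop version in its file name, so the 3.3.0 in the command above has to match the installed release (the download at the top of this article is 3.2.3). A sketch that avoids hard-coding the version follows; find_examples_jar is a hypothetical helper.

```shell
# Print the path of the MapReduce examples jar under a Hadoop install directory.
# $1: Hadoop install directory (e.g. $HADOOP)
find_examples_jar() {
  for jar in "$1"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar; do
    if [ -f "$jar" ]; then
      echo "$jar"
      return 0
    fi
  done
  echo "no examples jar under $1/share/hadoop/mapreduce" >&2
  return 1
}
```

For example: hadoop jar "$(find_examples_jar "$HADOOP")"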

The output shows that Hadoop ships with many examples:

An example program must be given as the first argument. Valid program names are:

aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.

Next, try the grep example

cd $HADOOP
mkdir input
cp etc/hadoop/*.xml input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar grep input output 'hadoop.*'

The jar subcommand tells hadoop to run a program packaged in a JAR file.

Check the result
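As a sketch, the output directory can be checked with a small helper that relies on the _SUCCESS marker and the part-r-* files the job writes. show_job_result is a hypothetical name.

```shell
# Print the job output if the output directory contains a _SUCCESS marker.
# $1: job output directory (e.g. output)
show_job_result() {
  if [ ! -e "$1/_SUCCESS" ]; then
    echo "no _SUCCESS marker in $1; the job did not finish cleanly" >&2
    return 1
  fi
  # Reducer output files are named part-r-00000, part-r-00001, ...
  cat "$1"/part-r-*
}
```

For example: show_job_result output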

_SUCCESS is an empty file that marks successful completion.
part-r-00000 holds the result.

Hadoop pseudo-distributed configuration

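This mode also runs on a single machine, but uses separate Java processes to imitate the different node roles of a distributed deployment (NameNode, DataNode, JobTracker, TaskTracker, SecondaryNameNode). Note how these roles differ: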

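From the storage point of view, a cluster consists of one NameNode and a number of DataNodes, plus a SecondaryNameNode. Despite its name, the SecondaryNameNode is not a standby copy of the NameNode; it periodically merges the NameNode's edit log into a checkpoint of the file system image.

From the application point of view, a cluster consists of one JobTracker and a number of TaskTrackers. The JobTracker schedules tasks and the TaskTrackers execute them in parallel. These are the Hadoop 1.x names; in Hadoop 2 and later, including the 3.x release used here, YARN's ResourceManager and NodeManager take over these roles, which is why those are the daemons started earlier.

TaskTrackers normally run on the same machines as DataNodes so that computation happens close to the data. The JobTracker and the NameNode do not have to be on the same machine.

In pseudo-distributed mode one machine acts as both namenode and datanode, and as both jobtracker and tasktracker. No real distributed computation across several machines takes place, hence the name "pseudo-distributed". Running several processes simulates a fully distributed cluster but does not actually make programs run faster.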

If you run a job directly, as in standalone mode, it fails because hdfs://localhost:9000 cannot be reached. The fix is to start the namenode and datanode.

Without any configuration the namenode cannot be started; it fails with Error: Cannot set priority of namenode process 57675 (the datanode, by contrast, does start).

Hadoop's configuration files need to be edited. They are located in $HADOOP/etc/hadoop/. The following three files are involved:

core-site.xml (cluster-wide Hadoop settings that apply to all daemons and clients)
hdfs-site.xml (settings for the HDFS cluster)
mapred-site.xml (settings for the MapReduce cluster)

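etc/hadoop/core-site.xml must be modified. Add the hadoop.tmp.dir and fs.defaultFS properties.

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/chen/.hadoop/tmp</value>
    <description>a temporary directory for hadoop</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

etc/hadoop/hdfs-site.xml (optional; leaving it unchanged does not prevent Hadoop from running). Here the replication factor is set to 1.

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

etc/hadoop/yarn-site.xml (optional; leaving it unchanged does not prevent Hadoop from running)

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
</configuration>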

After these changes, start the namenode and datanode. The job still fails, however, because the directory hdfs://localhost:9000/user//input does not exist.
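One way to create the missing directory is hdfs dfs -mkdir -p. The sketch below assumes the job uses the relative path input, which HDFS resolves against the current user's home directory /user/<current user>; make_input_dir is a hypothetical helper and the user name is taken from id -un.

```shell
# Create the HDFS input directory for the current user.
make_input_dir() {
  # Assumption: relative HDFS paths resolve under /user/<current user>.
  target="/user/$(id -un)/input"
  if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -mkdir -p "$target"
  else
    echo "hdfs not found on PATH; would run: hdfs dfs -mkdir -p $target" >&2
    return 1
  fi
}
```

Once the directory exists, the local XML files can be uploaded with hdfs dfs -put etc/hadoop/*.xml input before rerunning the job.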
